library(tidyverse)
library(readxl)
library(janitor)
library(lubridate)
library(stringr)
library(tidycensus)
library(tigris)
library(sf)
library(ggplot2)
library(ranger)
library(tidymodels)
library(Metrics)
library(vip)
# Silence messages from tidycensus
options(tidycensus.quiet = TRUE)
Predicting Household SNAP Participation Rates by County in the Census South Region
Data Science for Public Policy Final Project
Background and Literature Review
The Supplemental Nutrition Assistance Program (SNAP, formerly known as Food Stamps) provides nutritional support for low-income households across the United States. As one of the largest means-tested benefit programs administered by the federal government, it is a crucial part of the U.S. social safety net, lifting 3.4 million people out of poverty in 2023 (Creamer and King, 2024). Understanding the factors that contribute to SNAP participation can help researchers and policymakers design strategies to ensure that families receive the benefits they need. To that end, this project builds a model to predict SNAP household participation rates at the county level using county- and state-level demographic, economic, political, and policy characteristics for the 16 states and the District of Columbia that make up the Census South region: Delaware, the District of Columbia, Florida, Georgia, Maryland, North Carolina, South Carolina, Virginia, West Virginia, Alabama, Kentucky, Mississippi, Tennessee, Arkansas, Louisiana, Oklahoma, and Texas.
Past research has identified numerous factors that influence SNAP participation rates. Negative economic conditions, such as high unemployment and poverty rates, are strongly correlated with SNAP participation, as are demographic characteristics such as disability status, immigrant status, and lower educational attainment (Pinard et al., 2017). Age is also an important factor, with eligible seniors enrolling in SNAP at much lower rates than eligible non-seniors (Jones et al., 2021).
Policy conditions have been found to play a significant role in determining SNAP participation. Broad-Based Categorical Eligibility (BBCE) is a policy that expands SNAP eligibility to households that qualify for Temporary Assistance for Needy Families (TANF) or a state maintenance of effort (MOE) funded benefit (FNS, n.d.a). States may choose to implement BBCE, effectively allowing them to raise the SNAP gross income limit to up to 200% of the federal poverty line and to raise or eliminate the asset limit. Dickert-Conlin et al. (2021) and Han (2016) find that states’ adoption of BBCE had significant positive effects on SNAP caseloads. Able-bodied adults without dependents (ABAWDs) are subject to work requirements to maintain SNAP eligibility: an ABAWD who does not meet the work requirement can receive benefits for only three months in any three-year period. However, states may request that this time limit be waived for part or all of the state, effectively suspending the work requirement (FNS, n.d.b). Dickert-Conlin et al. (2021) find that work requirements are negatively associated with SNAP participation.
Data Sources
SNAP Participation Rates
The numerator is taken from SNAP data tables published by the Food and Nutrition Service, specifically the Bi-Annual State Project Area/County Level Participation and Issuance Data published in a .zip file at this URL: https://www.fns.usda.gov/pd/supplemental-nutrition-assistance-program-snap. The .zip file contains a set of Excel files with counts of individuals and households participating in SNAP by county in January and July of each year from 1984 to 2023. We use a subset of those years, 2015-2019 and 2023, skipping 2020-2022 because the dramatically altered economic and policy environment of the COVID-19 pandemic would likely make data from those years less applicable to other years. From this data, we take the average of the January and July counts of households participating in SNAP. A subset of counties is missing from the FNS dataset; for those counties, we substitute data from the American Community Survey (ACS) 5-year estimates, whose table B22001 includes the number of households receiving SNAP benefits.
We take the denominator of the SNAP participation rate from the ACS B22001 table as well, as it also contains the total number of households by county. For the counties missing from the FNS dataset, we inflate the ACS data, since SNAP participation is often significantly underreported in household surveys such as the ACS (Rothbaum et al., 2021). We calculate the inflation factor at the state level, taking the average number of households receiving SNAP benefits in each state-year from another FNS dataset (the .zip file under “National and/or State Level Monthly and/or Annual Data” at the URL above) and dividing it by the number of households reported as receiving SNAP benefits in that state-year in the ACS. For each county whose count of SNAP households came from the ACS rather than FNS, we multiply the household SNAP participation rate by the inflation factor to obtain a more accurate estimate.
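As a quick numeric illustration of this inflation adjustment (the figures below are hypothetical, not actual state totals):

```r
# Hypothetical state-year in which FNS administrative data reports more
# SNAP households than the ACS survey does:
fns_snap_households <- 500000   # administrative count from FNS
acs_snap_households <- 400000   # survey-based count from the ACS
inflator <- fns_snap_households / acs_snap_households  # 1.25

# A county whose ACS-based SNAP participation rate is 12% is scaled up
# by the state-level inflator to correct for survey underreporting:
acs_snap_rate <- 0.12
acs_snap_rate_infl <- acs_snap_rate * inflator  # 0.15
```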
counties_filepath <- "data/snap-zip-fns388a-4"
counties_filelist <- list.files(path = counties_filepath, pattern = "\\.xlsx?$", full.names = TRUE)
# function to read in SNAP county data files downloaded from FNS
read_snap_file <- function(file_path) {
snap_df <- read_excel(file_path, skip = 3) |> clean_names()
snap_df <- snap_df %>% select(-matches("^\\.\\.\\.|^x\\d+$")) # drop empty columns
file_name <- basename(file_path)
file_month <- str_extract(file_name, "^[A-Za-z]+(?=\\s*\\d{4})")
year <- str_extract(file_name, "\\d{4}(?=\\.)") %>% as.numeric()
snap_df <- snap_df |>
mutate(
month = if_else(toupper(file_month) %in% c("JUL", "JULY"), 7, 1),
year = year,
state_code = str_sub(substate_region, 1, 2),
county_code = str_sub(substate_region, 3, 5),
GEOID = str_sub(substate_region, 1, 5)
) %>%
select(
year, month, GEOID, state_code, county_code,
calc_snap_total_pa_and_non_pa_households
) |>
group_by(year, month, state_code, county_code, GEOID) |>
summarize(fns_snap_households = sum(calc_snap_total_pa_and_non_pa_households)) |>
filter(!str_detect(state_code, "\\D"))
return(snap_df)
}
fns_snap_county <- map_dfr(counties_filelist, possibly(read_snap_file, otherwise = NULL)) %>%
filter(year %in% c(2015, 2016, 2017, 2018, 2019, 2023)) |>
group_by(year, GEOID, state_code, county_code) |>
summarize(fns_snap_households = round(mean(fns_snap_households)))
# function to read in SNAP state totals downloaded from FNS
read_state_totals <- function(file_path) {
filename <- basename(file_path)
FY <- as.numeric(str_extract(filename, "(?<=FY)\\d{2}"))
FY <- 2000 + FY
if (FY > 2019) {
rows_list <- list(
NERO = c("A8:B20", "A23:B35", "A38:B50", "A53:B65", "A68:B80", "A83:B95", "A98:B110"),
MARO = c("A8:B20", "A23:B35", "A38:B50", "A53:B65", "A68:B80", "A98:B110", "A113:B125"),
SERO = c("A8:B20", "A23:B35", "A38:B50", "A53:B65", "A68:B80", "A83:B95", "A98:B110", "A113:B125"),
MWRO = c("A8:B20", "A23:B35", "A38:B50", "A53:B65", "A68:B80", "A83:B95", "A98:B110"),
SWRO = c("A8:B20", "A23:B35", "A38:B50", "A53:B65", "A68:B80", "A83:B95", "A98:B110"),
MPRO = c("A8:B20", "A23:B35", "A38:B50", "A53:B65", "A68:B80", "A83:B95", "A98:B110", "A113:B125"),
WRO = c("A8:B20", "A38:B50", "A68:B80", "A83:B95", "A98:B110", "A113:B125", "A128:B140"))
} else if (FY %in% c(2015, 2016)) {
rows_list <- list(
NERO = c("A8:B20", "A23:B35", "A38:B50", "A53:B65", "A68:B80", "A83:B95", "A98:B110"),
MARO = c("A8:B20", "A23:B35", "A38:B50", "A53:B65", "A68:B80", "A85:B97", "A115:B127"),
SERO = c("A8:B20", "A23:B35", "A38:B50", "A53:B65", "A68:B80", "A83:B95", "A98:B110", "A113:B125"),
MWRO = c("A8:B20", "A23:B35", "A38:B50", "A53:B65", "A68:B80", "A83:B95"),
SWRO = c("A8:B20", "A23:B35", "A38:B50", "A53:B65", "A68:B80"),
MPRO = c("A8:B20", "A23:B35", "A38:B50", "A53:B65", "A68:B80", "A83:B95", "A98:B110", "A113:B125", "A128:B140", "A143:B155"),
WRO = c("A8:B20", "A25:B37", "A40:B52", "A70:B82", "A85:B97", "A100:B112", "A115:B127", "A130:B142"))
} else {
rows_list <- list(
NERO = c("A8:B20", "A23:B35", "A38:B50", "A53:B65", "A68:B80", "A83:B95", "A98:B110"),
MARO = c("A8:B20", "A23:B35", "A38:B50", "A53:B65", "A68:B80", "A85:B97", "A100:B112"),
SERO = c("A8:B20", "A23:B35", "A38:B50", "A53:B65", "A68:B80", "A83:B95", "A98:B110", "A113:B125"),
MWRO = c("A8:B20", "A23:B35", "A38:B50", "A53:B65", "A68:B80", "A83:B95", "A98:B110"),
SWRO = c("A8:B20", "A23:B35", "A38:B50", "A53:B65", "A68:B80", "A83:B95", "A98:B110"),
MPRO = c("A8:B20", "A23:B35", "A38:B50", "A53:B65", "A68:B80", "A83:B95", "A98:B110", "A113:B125"),
WRO = c("A8:B20", "A25:B37", "A55:B67", "A70:B82", "A85:B97", "A100:B112", "A115:B127"))
}
map_dfr(names(rows_list), function(sheet_name) {
ranges <- rows_list[[sheet_name]]
map_dfr(ranges, function(range) {
df <- read_excel(file_path, sheet = sheet_name, range = range, col_names = FALSE)
df <- df |>
setNames(c("month", "households")) |>
mutate(
state_name = month[1],
year = str_extract(month, "20\\d{2}"),
month = str_sub(month, 1, 3)
) |>
filter(!is.na(households))
return(df)
})
})
}
state_filepath <- "data/snap_state_totals"
state_filelist <- list.files(path = state_filepath, pattern = "\\.xlsx?$", full.names = TRUE)
fns_state_totals <- map_dfr(state_filelist, read_state_totals) |>
filter(!year %in% c(2014, 2020, 2021, 2022, 2024)) |>
group_by(year, state_name) |>
summarize(fns_snap_households = round(mean(households)))
# pull in SNAP data from ACS
acs_years <- c(2015, 2016, 2017, 2018, 2019, 2023)
acs_snap_county <- map_dfr(
acs_years,
~get_acs(
geography = "county",
variables = c(
total_households = "B22001_001",
acs_snap_households = "B22001_002"
),
year = .x,
survey = "acs5",
geometry = FALSE,
cache_table = TRUE
) |> mutate(year = .x)) |>
select(year, GEOID, NAME, variable, estimate) |>
pivot_wider(names_from = variable, values_from = estimate) |>
mutate(
acs_snap_rate = acs_snap_households / total_households,
state_fips = str_sub(GEOID, 1, 2)
) |>
filter(!state_fips %in% c("60", "66", "69", "72", "78"))
acs_state_totals <- acs_snap_county |>
group_by(year, state_fips) |>
summarize(acs_snap_households = sum(acs_snap_households))
state_fips <- fips_codes |>
select(state_code, state_name) |>
distinct(state_code, .keep_all = TRUE)
acs_inflators <- fns_state_totals |>
left_join(state_fips, by = "state_name") |>
mutate(year = as.numeric(year)) |>
left_join(acs_state_totals, by = c("state_code" = "state_fips", "year" = "year")) |>
group_by(year, state_code) |>
mutate(inflator = fns_snap_households/acs_snap_households) |>
select(year, state_name, state_code, inflator)
acs_snap_county_infl <- acs_snap_county |>
left_join(acs_inflators, by = c("state_fips" = "state_code", "year" = "year")) |>
mutate(acs_snap_rate_infl = acs_snap_rate*inflator)
snap_participation <- acs_snap_county_infl |>
full_join(fns_snap_county, by = c("GEOID", "year")) |>
mutate(fns_snap_rate = fns_snap_households/total_households,
snap_participation_rate = if_else(!is.na(fns_snap_rate), fns_snap_rate, acs_snap_rate_infl),
snap_rate_acs = if_else(!is.na(fns_snap_rate), 0, 1)) |>
select(year, GEOID, state_name, snap_participation_rate, snap_rate_acs, total_households)
us_counties <- counties(cb = TRUE) |>
filter(!STATEFP %in% c("60", "66", "69", "72", "78", "02", "15"))
snap_map <- us_counties |>
left_join(snap_participation, by = "GEOID")
snap_map |>
filter(year == 2015) |>
ggplot() +
geom_sf(mapping = aes(fill = snap_participation_rate)) +
scale_fill_gradient(low = "#cfe8f3", high = "#062635") +
theme_void() +
labs(title = "Household SNAP Participation by County in the Contiguous 48 States")
Demographic and Economic Characteristics
Census ACS Data
We also take demographic data from the ACS for the years 2015 to 2019 and 2023. Specifically, we take counts such as the number of white respondents, the number of respondents who rent their homes, and the number of respondents in poverty, then divide each by the total number of people asked that question to calculate a percentage for each county. In all, we use ACS data to measure the percent male, median age, percent white, percent of households that are families, percent with a bachelor’s degree or more, percent in poverty, percent disabled, percent renting their home, median household income, percent who are not citizens, and percent over 65 years old.
#Census Data
#######
census_vars <- tidycensus::load_variables(year = 2016, dataset = "acs5")
#B06011_001E is median income
#B19013_001 is median household income in past 12 months (collinearity w/median income?)
#B01001_001E is total population
#B01001_002E is number of males
#B17001_002E is number below poverty line
#B01002_001 is median age (total)
#B03002_003 is total number of non-hispanic white (_001 is overall)
#B15003_017 to _025 is education attainment levels
#B11001_002 is number of family households (_001 is overall)
#B18101_001 is total number disabled status
#_004,_007, etc are number that are disabled for age/sex groups
#B25003_003 is total number of renters (_001 is overall)
#B22001_002 is total number of households that received food stamps in prior 12 months (_001 is overall)
#B05001_006 is total number of non-citizens (_001 is overall)
#B01001_020-25 and _044-049 is total over 65 (_001 is overall)
get_snap <- function(year) {
available_vars <- load_variables(year, "acs5", cache = TRUE)$name
snap_vars <- c("B06011_001E", "B01001_001E", "B01001_002E",
"B17001_002E", "B17001_001E", "B19013_001",
"B01002_001", "B03002_003", "B03002_001",
"B15003_001", "B15003_017", "B15003_018", "B15003_019",
"B15003_020", "B15003_021", "B15003_022",
"B15003_023", "B15003_024", "B15003_025",
"B11001_002", "B11001_001", "B25003_003",
"B25003_001", "B18101_001", "B22001_002", "B22001_001",
"B18101_004", "B18101_007", "B18101_010", "B18101_013",
"B18101_016", "B18101_019", "B18101_022", "B18101_025",
"B18101_028", "B18101_031", "B05001_006", "B05001_001",
"B01001_020", "B01001_021", "B01001_022", "B01001_023",
"B01001_024", "B01001_025", "B01001_044", "B01001_045",
"B01001_046", "B01001_047", "B01001_048", "B01001_049")
snap_vars_nosuffix <- gsub("E$", "", snap_vars)
usable_vars <- intersect(snap_vars_nosuffix, available_vars)
if (length(usable_vars) == 0) {
warning(paste0("No matching variables available for year ", year))
return(NULL)
}
census_demo <- get_acs(geography = "county",
variables = usable_vars,
year = year,
survey = "acs5",
output = "tidy",
progress_bar = FALSE) |>
mutate(year = year) |>
mutate(variable = case_when(
variable == "B06011_001" ~ "median_income_individual",
variable == "B01001_001" ~ "total_population",
variable == "B01001_002" ~ "male_population",
variable == "B17001_002" ~ "below_poverty",
variable == "B17001_001" ~ "poverty_total",
variable == "B19013_001" ~ "median_household_income",
variable == "B01002_001" ~ "median_age",
variable == "B03002_003" ~ "white_alone",
variable == "B03002_001" ~ "total_race_population",
variable == "B15003_001" ~ "total_educ",
variable == "B15003_017" ~ "hs_grad",
variable == "B15003_018" ~ "ged",
variable == "B15003_019" ~ "some_college_less_1yr",
variable == "B15003_020" ~ "some_college",
variable == "B15003_021" ~ "assoc_degree",
variable == "B15003_022" ~ "bachelors_degree",
variable == "B15003_023" ~ "masters_degree",
variable == "B15003_024" ~ "professional_degree",
variable == "B15003_025" ~ "doctoral_degree",
variable == "B11001_002" ~ "family_households",
variable == "B11001_001" ~ "total_households",
variable == "B25003_003" ~ "renter_occupied",
variable == "B25003_001" ~ "total_occupied_housing",
variable == "B18101_001" ~ "total_disability_universe",
variable == "B22001_002" ~ "snap_recipient_hh_prior12",
variable == "B22001_001" ~ "total_snap_universe",
variable == "B05001_006" ~ "non_citizens",
variable == "B05001_001" ~ "non_citizens_universe",
TRUE ~ variable # keep original name if no match
))
census_wide <- census_demo |>
select(GEOID, NAME, year, variable, estimate) |>
pivot_wider(names_from = "variable",
values_from = "estimate")
census_wide <- census_wide |>
mutate(male_pct = (male_population/total_population),
white_pct = (white_alone/total_race_population),
family_pct = (family_households/total_households),
bachelor_or_higher_pct = ((bachelors_degree + masters_degree +
professional_degree + doctoral_degree)/
total_educ),
poverty_pct = (below_poverty/poverty_total),
disabled_pct = ((B18101_004 + B18101_007 + B18101_010 + B18101_013 +
B18101_016 + B18101_019 + B18101_022 + B18101_025 +
B18101_028 + B18101_031)/total_disability_universe),
snap_pct = (snap_recipient_hh_prior12/total_snap_universe),
renter_pct = (renter_occupied/total_occupied_housing),
non_citizen_pct = (non_citizens/non_citizens_universe),
over_65_pct = ((B01001_020 + B01001_021 + B01001_022 + B01001_023 +
B01001_024 + B01001_025 + B01001_044 + B01001_045 +
B01001_046 + B01001_047 + B01001_048 + B01001_049)/
total_population)) |>
select(GEOID, NAME, year, male_pct, median_age, white_pct, family_pct, bachelor_or_higher_pct,
poverty_pct, disabled_pct, snap_pct, renter_pct,
median_income_individual, median_household_income,
snap_recipient_hh_prior12, non_citizen_pct, over_65_pct)
assign(paste0("snap_", year), census_wide, envir = .GlobalEnv)
}
years <- c(2015, 2016, 2017, 2018, 2019, 2023)
walk(.x = years, .f = get_snap)
combined_census <- bind_rows(snap_2015, snap_2016, snap_2017, snap_2018,
snap_2019, snap_2023) |>
arrange(GEOID)
USDA Rural-Urban Data
Another data source we use is the Rural-Urban Continuum Codes published by the USDA (available here: https://www.ers.usda.gov/data-products/rural-urban-continuum-codes). These codes assign each county a value from 1 (most urban) to 9 (most rural).
ruralurbancodes2013 <- read_excel("data/ruralurbancodes2013.xls")
ruralurbancodes2023 <- read_excel("data/Ruralurbancontinuumcodes2023.xlsx")
ruralurbancodes2013 <- ruralurbancodes2013 |>
rename(GEOID = FIPS)
ruralurbancodes2023 <- ruralurbancodes2023 |>
rename(GEOID = FIPS)
census_urban_rural <- combined_census |>
left_join(ruralurbancodes2013 |>
select(GEOID, RUCC_2013), by = "GEOID") |>
left_join(ruralurbancodes2023 |>
select(GEOID, RUCC_2023), by = "GEOID") |>
mutate(rural_urban_score = case_when(
year == 2023 ~ RUCC_2023,
TRUE ~ RUCC_2013)) |>
select(-RUCC_2013, -RUCC_2023)
University of Kentucky Welfare Data
A third source of demographic data comes from the University of Kentucky’s Center for Poverty Research, which publishes national welfare data (available here: https://cpr.uky.edu/resources/national-welfare-data). From this data, we collect state-level information such as the unemployment rate, the percent of residents with varying degrees of food insecurity, the gross state product, whether the governor is a Democrat, the percentage of state legislators who are Democrats, and the minimum wage. The political data has some quirks that result in missingness: D.C. has no governor, and Nebraska has a unicameral legislature. To address this, we manually impute the political party of D.C.’s mayor and apply Nebraska’s legislature data to both chambers.
ky_welfare <- read_excel("data/UKCPR_National_Welfare_Data_1980_2023.xlsx",
sheet = "Data",
col_types = c("text",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric",
"text", "numeric", "numeric", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric",
"numeric", "numeric", "numeric")) |>
janitor::clean_names()
ky_welfare <- ky_welfare |>
filter(year %in% c(2015,2016,2017,2018,2019,2023)) |>
mutate(state_fips = str_pad(state_fips, width = 2, side = "left", pad = "0")) |>
select(state_fips, year, unemployment_rate, marginally_food_insecure, food_insecure,
very_low_food_secure, gross_state_product, governor_is_democrat_1_yes,
fraction_of_state_house_that_is_democrat, fraction_of_state_senate_that_is_democrat,
state_minimum_wage)
census_urban_rural <- census_urban_rural |>
mutate(state_fips = substr(GEOID, 1, 2))
demographic_combined <- census_urban_rural |>
left_join(ky_welfare, by = c("state_fips", "year"))
#Filtering out Puerto Rico
demographic_combined <- demographic_combined |>
filter(state_fips != 72)
#Manually imputing 1 to reflect DC's Democratic mayor each year, and
#Manually imputing Nebraska's unicameral makeup to both state house and senate pct
demographic_combined <- demographic_combined |>
mutate(
governor_is_democrat_1_yes = ifelse(GEOID == "11001", 1, governor_is_democrat_1_yes),
fraction_of_state_house_that_is_democrat = case_when(
GEOID == "11001" ~ 1,
state_fips == "31" & year %in% c(2015, 2016) ~ 12/49,
state_fips == "31" & year %in% c(2017, 2018, 2019) ~ 15/49,
state_fips == "31" & year == 2023 ~ 17/49,
TRUE ~ fraction_of_state_house_that_is_democrat),
fraction_of_state_senate_that_is_democrat = case_when(
state_fips == "31" & year %in% c(2015, 2016) ~ 12/49,
state_fips == "31" & year %in% c(2017, 2018, 2019) ~ 15/49,
state_fips == "31" & year == 2023 ~ 17/49,
TRUE ~ fraction_of_state_senate_that_is_democrat))
SNAP Policy Predictors
We include four variables covering state-level policies governing SNAP eligibility, three of which were constructed from the FNS’s SNAP Policy Database (available here: https://www.ers.usda.gov/data-products/snap-policy-data-sets) and one of which was entered manually. Specifically, our data contains:
* A binary variable indicating whether a state implemented BBCE in a given year, taken directly from the SNAP Policy Database for 2015-2019. For 2023, this information was collected from FNS’s BBCE site (accessed via the Wayback Machine here: https://web.archive.org/web/20230331232320/https://www.fns.usda.gov/snap/broad-based-categorical-eligibility).
* A continuous variable measuring the state’s income limit for SNAP recipients as a percentage of the federal poverty line. For states that implement BBCE, the SNAP Policy Database contains this information for 2015-2019, with non-BBCE states coded as missing; we recoded non-BBCE states as 130, because 130% is the default gross income limit for SNAP eligibility. For 2023, this information was collected from the FNS BBCE site linked above.
* A categorical variable that equals 0 if a state has the default federal asset limit for SNAP eligibility ($2,250 for most of the years in the sample), 1 if it has increased the asset limit via BBCE, and 2 if it has eliminated the asset limit entirely. This variable was constructed from the SNAP Policy Database for 2015-2019; for 2023, this information was collected from the FNS BBCE site linked above.
* A categorical variable that equals 0 for states that have not waived the ABAWD work requirement time limit, 1 for states that have waived the time limit for part of the state, and 2 for states that have waived the time limit statewide. For 2015-2019, this data was collected by visually inspecting a map published by the Center on Budget and Policy Priorities showing areas covered by ABAWD time limit waivers for all 50 states (available here: https://www.cbpp.org/research/states-have-requested-waivers-from-snaps-time-limit-in-high-unemployment-areas-for-the-past). For DC, 2015-2019 ABAWD time limit waiver data was collected directly from FNS (available here: https://www.fns.usda.gov/snap/abawd/waivers/2015-2019). For 2023, this information was collected from the 2023 State Options Report (https://fns-prod.azureedge.us/sites/default/files/resource-files/snap-15th-state-options-report-october23.pdf).
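The coding decisions above can be sketched with a toy table. The column names mirror those used later in this project, but the rows and values below are purely illustrative, not actual states or database entries:

```r
# Toy policy table: bbce (adopted?), bbce_inclmt (gross income limit as a
# percent of the federal poverty line), bbce_asset (0 = federal default,
# 1 = raised via BBCE, 2 = eliminated). Values are hypothetical.
policy_example <- data.frame(
  state_fips  = c("01", "48", "11"),
  bbce        = c(0, 1, 1),
  bbce_inclmt = c(NA, 165, 200),
  bbce_asset  = c(0, 1, 2)
)

# Non-BBCE states, coded as missing in the source, fall back to the
# default 130% gross income limit:
policy_example$bbce_inclmt <- ifelse(
  is.na(policy_example$bbce_inclmt), 130, policy_example$bbce_inclmt
)
```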
poldb <- read_xlsx("data/SNAPPolicyDatabase.xlsx", sheet = "SNAP Policy Database")
abawd_waivers <- read_xlsx("data/SNAPPolicyDatabase.xlsx", sheet = "ABAWD Waivers")
poldb_2023 <- read_xlsx("data/SNAPPolicyDatabase.xlsx", sheet = "2023 Data")
mode <- function(x) {
u <- unique(x)
tab <- tabulate(match(x, u))
u[which.max(tab)] # return a single modal value (ties broken by first occurrence)
}
combined_poldb <- poldb |>
mutate(year = as.numeric(str_sub(as.character(yearmonth), 1, 4))) |>
group_by(year, state_fips) |>
summarize(bbce = mode(bbce),
bbce_asset = mode(bbce_asset),
bbce_inclmt = mode(bbce_inclmt)) |>
left_join(abawd_waivers, by = c("state_fips", "year")) |>
bind_rows(poldb_2023) |>
filter(year > 2014,
year != 2020) |>
mutate(state_fips = str_pad(as.character(state_fips), width = 2, side = "left", pad = "0"))
Data Wrangling and Exploratory Data Analysis
Merging datasets
snap_2model <- snap_participation |>
left_join(demographic_combined, by = c("GEOID", "year")) |>
left_join(combined_poldb, by = c("state_fips", "year")) |>
filter(snap_participation_rate < 1,
state_name %in% c("Alabama", "Arkansas", "Delaware", "District of Columbia", "Florida", "Georgia",
"Kentucky", "Louisiana", "Maryland", "Mississippi", "North Carolina", "Oklahoma", "South Carolina",
"Tennessee", "Texas", "Virginia", "West Virginia")) |>
select(-c(median_income_individual, state_name, NAME, snap_recipient_hh_prior12, snap_pct)) |>
drop_na()
# Exclude counties that don't have data for all years
county_observations <- snap_2model |>
group_by(GEOID) |>
summarize(num_obs = n()) |>
filter(num_obs < 6)
snap_2model <- snap_2model |>
anti_join(county_observations, by = c("GEOID"))
# Remove extraneous dataframes to save memory
rm(abawd_waivers, acs_inflators, acs_snap_county, acs_snap_county_infl, acs_state_totals, acs_years, census_urban_rural,
census_vars, combined_census, combined_poldb, county_observations, demographic_combined, fns_snap_county, fns_state_totals,
ky_welfare, poldb, poldb_2023, ruralurbancodes2013, ruralurbancodes2023, snap_2015, snap_2016, snap_2017, snap_2018,
snap_2019, snap_2023, snap_participation, state_fips, us_counties, snap_map)
Splitting Testing and Training Data
We build our predictive model using data from 2017-2019 and 2023, training on 2017-2019 and testing on 2023. We do not use 2020-2022, as the COVID-19 pandemic dramatically altered the policy and economic landscape, which could undermine the training of our model. We begin by splitting the data into training and testing sets and then conduct exploratory data analysis.
snap_test <- snap_2model |>
filter(year == 2023)
snap_train <- snap_2model |>
filter(year %in% c(2017, 2018, 2019)) |>
arrange(year)
snap_train |> group_by(year) |> summarize(n = n())

# A tibble: 3 × 2
   year     n
  <dbl> <int>
1  2017  1420
2  2018  1420
3  2019  1420
Exploratory Data Analysis
We begin by simply examining average county SNAP participation rates by year and by state (for 2019) within the training dataset.
snap_train |>
group_by(year) |>
summarize(year_avg_rate = mean(snap_participation_rate, na.rm = TRUE), .groups = "drop") |>
ggplot(aes(x = year, y = year_avg_rate)) +
geom_line(linewidth = 1.2, color = "darkblue") +
geom_point(color = "darkred") +
labs(
title = "Southern State Average SNAP Participation Rate by Year (2017–2019)",
x = "Year", y = "Average Participation Rate"
) +
scale_x_continuous(breaks = 2017:2019) +
theme_minimal()
Average county SNAP participation rates decreased slightly from 2017 to 2019, from about 21.5% to about 19%. Note that these are averages of county rates, not the overall rate across all counties, so they weight each county equally.
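To see why an equal-weighted average of county rates can differ from the overall household-weighted rate, consider a hypothetical two-county example:

```r
# Hypothetical: a small high-participation county and a large
# low-participation county.
county_rates <- c(0.30, 0.10)
households   <- c(1000, 100000)

mean(county_rates)                       # unweighted average of county rates: 0.20
weighted.mean(county_rates, households)  # household-weighted rate: ~0.102
```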
snap_train |>
mutate(state_name = case_when(state_fips == "01" ~ "Alabama",
state_fips == "05" ~ "Arkansas",
state_fips == "10" ~ "Delaware",
state_fips == "11" ~ "District of Columbia",
state_fips == "12" ~ "Florida",
state_fips == "13" ~ "Georgia",
state_fips == "21" ~ "Kentucky",
state_fips == "22" ~ "Louisiana",
state_fips == "24" ~ "Maryland",
state_fips == "28" ~ "Mississippi",
state_fips == "37" ~ "North Carolina",
state_fips == "40" ~ "Oklahoma",
state_fips == "45" ~ "South Carolina",
state_fips == "47" ~ "Tennessee",
state_fips == "48" ~ "Texas",
state_fips == "51" ~ "Virginia",
state_fips == "54" ~ "West Virginia")) |>
filter(year == 2019) |>
group_by(state_name) |>
summarize(state_avg_rate = mean(snap_participation_rate, na.rm = TRUE), .groups = "drop") |>
ggplot(aes(x = state_name, y = state_avg_rate)) +
geom_col() +
labs(
title = "Average SNAP Participation Rate by State 2019",
x = "", y = "Average Participation Rate"
) +
theme(axis.text.x = element_text(angle = 45, vjust = .75))
Average SNAP participation rates vary significantly across states, ranging from Virginia at just below 15% to its neighbor West Virginia at nearly 25%.
Now we investigate the relationship between SNAP participation rates and various county-level economic conditions.
snap_train |> filter(year == 2019) |>
ggplot(mapping = aes(x = median_household_income, y = snap_participation_rate)) +
geom_point(alpha = 0.25) +
stat_smooth(method = "lm",
formula = y ~ x,
geom = "smooth") +
labs(x = "Median Household Income", y = "SNAP Participation Rate", title = "SNAP Participation Rate by Household Income 2019") +
theme_minimal()
snap_train |> filter(year == 2019) |>
ggplot(mapping = aes(x = poverty_pct, y = snap_participation_rate)) +
geom_point(alpha = 0.25) +
stat_smooth(method = "lm",
formula = y ~ x,
geom = "smooth") +
labs(x = "Poverty Rate", y = "SNAP Participation Rate", title = "SNAP Participation Rate by Poverty Rate 2019") +
theme_minimal()
snap_train |> filter(year == 2019) |>
ggplot(mapping = aes(x = unemployment_rate, y = snap_participation_rate)) +
geom_point(alpha = 0.25) +
stat_smooth(method = "lm",
formula = y ~ x,
geom = "smooth") +
labs(x = "Unemployment Rate", y = "SNAP Participation Rate", title = "SNAP Participation Rate by Unemployment Rate 2019") +
theme_minimal()
We see a strong negative association between SNAP participation rates and median household income, and positive associations between SNAP participation rates and both poverty and unemployment. The relationship between poverty and SNAP participation is especially strong, with points tightly hugging the line of best fit. These associations are not surprising given SNAP’s status as a means-tested benefit program.
We now examine the relationship between SNAP and county demographic characteristics.
snap_train |>
  filter(year == 2019) |>
  ggplot(mapping = aes(x = white_pct, y = snap_participation_rate)) +
  geom_point(alpha = 0.25) +
  stat_smooth(method = "lm", formula = y ~ x, geom = "smooth") +
  labs(x = "Percentage White", y = "SNAP Participation Rate",
       title = "SNAP Participation Rate by County Racial Demographics 2019") +
  theme_minimal()
snap_train |>
  filter(year == 2019) |>
  mutate(rural_urban_score = factor(rural_urban_score)) |>
  group_by(rural_urban_score) |>
  summarize(urban_avg_rate = mean(snap_participation_rate, na.rm = TRUE), .groups = "drop") |>
  ggplot(aes(x = rural_urban_score, y = urban_avg_rate)) +
  geom_col() +
  labs(title = "Average SNAP Participation Rate by Urbanicity",
       x = "Urban-Rural Score", y = "Average Participation Rate") +
  theme_minimal()
Counties with a higher percentage of white residents tend to participate less in SNAP. Urbanicity does not have a simple linear relationship with SNAP participation: the most urban counties (urban-rural score of 1) have the lowest participation rates, the highest rates appear in suburban counties, and the most rural counties (score of 9) fall in the middle.
Finally, we explore the relationship between SNAP participation and our selected policy variables.
snap_train |>
  filter(year == 2019) |>
  mutate(bbce = if_else(bbce == 0, "No", "Yes")) |>
  group_by(bbce) |>
  summarize(bbce_avg_rate = mean(snap_participation_rate, na.rm = TRUE), .groups = "drop") |>
  ggplot(aes(x = bbce, y = bbce_avg_rate)) +
  geom_col() +
  labs(title = "Average SNAP Participation Rate by BBCE Participation",
       x = "BBCE", y = "Average Participation Rate") +
  theme_minimal()
snap_train |>
  filter(year == 2019) |>
  mutate(bbce_inclmt = if_else(bbce_inclmt == -9, 130, bbce_inclmt),
         bbce_inclmt = factor(bbce_inclmt)) |>
  group_by(bbce_inclmt) |>
  summarize(inclt_avg_rate = mean(snap_participation_rate, na.rm = TRUE), .groups = "drop") |>
  ggplot(aes(x = bbce_inclmt, y = inclt_avg_rate)) +
  geom_col() +
  labs(title = "Average SNAP Participation Rate by Income Limit",
       x = "SNAP Income Limit as Percent of Federal Poverty Line",
       y = "Average Participation Rate") +
  theme_minimal()
snap_train |>
  filter(year == 2019) |>
  mutate(abawdwaiver = case_when(
    abawdwaiver == 0 ~ "No waivers",
    abawdwaiver == 1 ~ "Waivers Covering Part of State",
    abawdwaiver == 2 ~ "Waivers for Entire State"
  )) |>
  group_by(abawdwaiver) |>
  summarize(waiver_avg_rate = mean(snap_participation_rate, na.rm = TRUE), .groups = "drop") |>
  ggplot(aes(x = abawdwaiver, y = waiver_avg_rate)) +
  geom_col() +
  labs(title = "Average SNAP Participation Rate by Waiver Status",
       x = "ABAWD Time Limit Waivers", y = "Average Participation Rate") +
  theme_minimal()
We see essentially no difference in SNAP participation rates between states that implement BBCE and those that do not. Moreover, SNAP participation rates do not appear to have a simple linear association with income limits for eligibility; other variables, such as poverty, may confound the relationships between BBCE, the income limit, and SNAP participation. ABAWD time limit waivers, unsurprisingly, have a positive association with SNAP participation, suggesting that these waivers make it easier for people to retain SNAP benefits.
Data Analysis
Training models
Creating folds and a recipe
Now that we have explored our training data set, we will begin training our models. The first step is setting up resamples: across the three training years we have two resamples, the first training on 2017 and testing on 2018, and the second training on 2018 and testing on 2019. The final model will ultimately be fit on 2019 data, the training year closest to our 2023 test set.
# set up folds: each slice is one year of data (1,420 counties), and
# skip = 1419 advances the window a full year at a time, yielding exactly
# two resamples (assumes snap_train is sorted by year)
snap_folds <- rolling_origin(snap_train, initial = 1420, assess = 1420,
                             skip = 1419, cumulative = FALSE)
snap_2019 <- snap_train |>
  filter(year == 2019)
We then pre-process our data. We remove predictors with zero variance, update the BBCE-related variables as described in the data sources section above, convert all nominal predictors into dummy variables, and normalize all numeric predictors.
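This fold logic can be sanity-checked on a toy data frame, with two rows standing in for each year's 1,420 counties (a minimal sketch; as with the real data, rows must be sorted by year):

```r
library(rsample)

# Two rows per "year" stand in for 1,420 counties, so initial = assess = 2,
# and skip = initial - 1 = 1 advances the window one full year at a time
toy <- data.frame(year = rep(2017:2019, each = 2), x = 1:6)
folds <- rolling_origin(toy, initial = 2, assess = 2,
                        skip = 1, cumulative = FALSE)

length(folds$splits)                        # 2
unique(analysis(folds$splits[[1]])$year)    # 2017 (train), first resample
unique(assessment(folds$splits[[1]])$year)  # 2018 (test)
unique(analysis(folds$splits[[2]])$year)    # 2018 (train), second resample
unique(assessment(folds$splits[[2]])$year)  # 2019 (test)
```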
# create recipe
snap_rec <-
  recipe(formula = snap_participation_rate ~ ., data = snap_train) |>
  step_zv(all_predictors()) |>
  update_role(GEOID, new_role = "id variable") |>
  step_mutate(bbce_inclmt = if_else(bbce_inclmt == -9, 130, bbce_inclmt),
              bbce_asset = case_when(bbce_asset == 1 ~ 2,
                                     bbce_asset == 0 ~ 1,
                                     bbce_asset == -9 ~ 0),
              bbce_asset = factor(bbce_asset),
              abawdwaiver = factor(abawdwaiver)) |>
  step_normalize(all_numeric_predictors(),
                 -c(snap_rate_acs, rural_urban_score, governor_is_democrat_1_yes, bbce)) |>
  step_dummy(all_nominal_predictors())
Elastic Net Linear Regression
Our first model is an elastic net. We tune the penalty hyperparameter to find the value that minimizes RMSE, holding the mixture parameter fixed at 0.5 (an equal blend of lasso and ridge penalties).
lm_spec <- linear_reg(penalty = tune(), mixture = 0.5) |>
  set_engine("glmnet") |>
  set_mode("regression")
lm_grid <- grid_regular(penalty(), levels = 10)
lm_wf <- workflow() |>
  add_recipe(snap_rec) |>
  add_model(lm_spec)
lm_res <- lm_wf |>
  tune_grid(resamples = snap_folds,
            grid = lm_grid,
            metrics = metric_set(yardstick::rmse, yardstick::mae))
lm_res |>
  collect_metrics()
# A tibble: 20 × 7
penalty .metric .estimator mean n std_err .config
<dbl> <chr> <chr> <dbl> <int> <dbl> <chr>
1 0.0000000001 mae standard 0.0317 1421 0.0000311 Preprocessor1_Model…
2 0.0000000001 rmse standard 0.0423 1421 0.0000327 Preprocessor1_Model…
3 0.00000000129 mae standard 0.0317 1421 0.0000311 Preprocessor1_Model…
4 0.00000000129 rmse standard 0.0423 1421 0.0000327 Preprocessor1_Model…
5 0.0000000167 mae standard 0.0317 1421 0.0000311 Preprocessor1_Model…
6 0.0000000167 rmse standard 0.0423 1421 0.0000327 Preprocessor1_Model…
7 0.000000215 mae standard 0.0317 1421 0.0000311 Preprocessor1_Model…
8 0.000000215 rmse standard 0.0423 1421 0.0000327 Preprocessor1_Model…
9 0.00000278 mae standard 0.0317 1421 0.0000311 Preprocessor1_Model…
10 0.00000278 rmse standard 0.0423 1421 0.0000327 Preprocessor1_Model…
11 0.0000359 mae standard 0.0317 1421 0.0000311 Preprocessor1_Model…
12 0.0000359 rmse standard 0.0423 1421 0.0000327 Preprocessor1_Model…
13 0.000464 mae standard 0.0318 1421 0.0000270 Preprocessor1_Model…
14 0.000464 rmse standard 0.0424 1421 0.0000284 Preprocessor1_Model…
15 0.00599 mae standard 0.0331 1421 0.0000104 Preprocessor1_Model…
16 0.00599 rmse standard 0.0443 1421 0.00000899 Preprocessor1_Model…
17 0.0774 mae standard 0.0559 1421 0.00000806 Preprocessor1_Model…
18 0.0774 rmse standard 0.0708 1421 0.0000182 Preprocessor1_Model…
19 1 mae standard 0.0709 1421 0.0000215 Preprocessor1_Model…
20 1 rmse standard 0.0898 1421 0.0000428 Preprocessor1_Model…
lm_res |>
  autoplot()
K-Nearest Neighbors
Our next model is K-Nearest Neighbors. Here, we tune the number of neighbors to find the value that minimizes RMSE.
knn_spec <- nearest_neighbor(neighbors = tune()) |>
  set_engine(engine = "kknn") |>
  set_mode(mode = "regression")
knn_wf <- workflow() |>
  add_model(spec = knn_spec) |>
  add_recipe(recipe = snap_rec)
knn_grid <- grid_regular(neighbors(range = c(1, 15)), levels = 8)
knn_res <- knn_wf |>
  tune_grid(resamples = snap_folds,
            grid = knn_grid,
            control = control_grid(save_pred = TRUE),
            metrics = metric_set(yardstick::rmse, yardstick::rsq))
knn_res |>
  collect_metrics()
# A tibble: 16 × 7
neighbors .metric .estimator mean n std_err .config
<int> <chr> <chr> <dbl> <int> <dbl> <chr>
1 1 rmse standard 0.0348 1421 0.000261 Preprocessor1_Model1
2 1 rsq standard 0.875 1421 0.00255 Preprocessor1_Model1
3 3 rmse standard 0.0331 1421 0.000207 Preprocessor1_Model2
4 3 rsq standard 0.875 1421 0.00204 Preprocessor1_Model2
5 5 rmse standard 0.0351 1421 0.000172 Preprocessor1_Model3
6 5 rsq standard 0.858 1421 0.00174 Preprocessor1_Model3
7 7 rmse standard 0.0364 1421 0.000150 Preprocessor1_Model4
8 7 rsq standard 0.848 1421 0.00150 Preprocessor1_Model4
9 9 rmse standard 0.0375 1421 0.000136 Preprocessor1_Model5
10 9 rsq standard 0.839 1421 0.00132 Preprocessor1_Model5
11 11 rmse standard 0.0386 1421 0.000128 Preprocessor1_Model6
12 11 rsq standard 0.832 1421 0.00119 Preprocessor1_Model6
13 13 rmse standard 0.0396 1421 0.000121 Preprocessor1_Model7
14 13 rsq standard 0.824 1421 0.00108 Preprocessor1_Model7
15 15 rmse standard 0.0405 1421 0.000116 Preprocessor1_Model8
16 15 rsq standard 0.818 1421 0.000995 Preprocessor1_Model8
knn_res |>
  autoplot()
Random Forest
Our final model is a random forest.
set.seed(05072025)
rf_spec <- rand_forest(trees = 100, min_n = 5) |>
  set_mode("regression") |>
  set_engine("ranger")
rf_wf <- workflow() |>
  add_model(rf_spec) |>
  add_recipe(snap_rec)
rf_res <- rf_wf |>
  fit_resamples(resamples = snap_folds)
rf_res |>
  collect_metrics()
# A tibble: 2 × 6
.metric .estimator mean n std_err .config
<chr> <chr> <dbl> <int> <dbl> <chr>
1 rmse standard 0.0382 1421 0.0000210 Preprocessor1_Model1
2 rsq standard 0.843 1421 0.000269 Preprocessor1_Model1
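Collecting the cross-validated RMSEs reported above into one small table makes the comparison explicit (values transcribed from the outputs above, not recomputed):

```r
# Cross-validated RMSEs transcribed from the tuning/resampling outputs above
cv_results <- data.frame(
  model = c("Elastic net (smallest penalties)", "KNN (k = 3)", "Random forest"),
  rmse  = c(0.0423, 0.0331, 0.0382)
)
cv_results[order(cv_results$rmse), ]  # KNN with k = 3 comes out on top
```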
Testing Final Model
The K-Nearest Neighbors model with three neighbors was the best-performing model, with a cross-validated RMSE of about 3.3 percentage points. We therefore fit that model on the 2019 data and test it on 2023.
knn_best <- knn_res |>
  select_best(metric = "rmse")
final_fit <- knn_wf |>
  finalize_workflow(parameters = knn_best) |>
  fit(data = snap_2019)
predictions <- bind_cols(
  snap_test,
  predict(object = final_fit, new_data = snap_test))
select(predictions, snap_participation_rate, .pred)
# A tibble: 1,420 × 2
snap_participation_rate .pred
<dbl> <dbl>
1 0.162 0.126
2 0.109 0.0985
3 0.328 0.268
4 0.231 0.207
5 0.152 0.134
6 0.383 0.300
7 0.360 0.311
8 0.243 0.213
9 0.264 0.253
10 0.185 0.180
# ℹ 1,410 more rows
Metrics::rmse(predictions$snap_participation_rate, predictions$.pred)
[1] 0.05624878
summary(snap_test$snap_participation_rate)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0000  0.1278  0.1826  0.2006  0.2588  0.6196 
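As a quick check on the scale of this error, the test RMSE can be compared with the mean participation rate reported above:

```r
# Relative size of the test error, using the two values reported above
test_rmse <- 0.05624878  # RMSE on the 2023 test set
mean_rate <- 0.2006      # mean SNAP participation rate across test counties
test_rmse / mean_rate    # roughly 0.28, i.e. an error of about 28% of the mean
```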
Discussion of Results
Error Rates
When implemented on the testing data, the RMSE was 5.62 percentage points, well above the roughly 3.3 percentage points achieved by the best model during cross-validation. Compared to the average SNAP participation rate of 20.06% across sampled counties in 2023, this is a very high relative error of about 28%. Given the magnitude of the RMSE, we would not recommend using this model for real-world predictions. The error rate also varied across states and across different levels of urbanicity and racial make-up. The first bar graph below shows the error rates by state, which vary dramatically: Texas has by far the highest RMSE, and the District of Columbia the lowest. State size appears to be a contributing factor, with states that have fewer counties tending toward lower error rates (D.C. has only one county-equivalent, while Texas has 254).
predictions |>
  mutate(state_name = case_when(state_fips == "01" ~ "Alabama",
                                state_fips == "05" ~ "Arkansas",
                                state_fips == "10" ~ "Delaware",
                                state_fips == "11" ~ "District of Columbia",
                                state_fips == "12" ~ "Florida",
                                state_fips == "13" ~ "Georgia",
                                state_fips == "21" ~ "Kentucky",
                                state_fips == "22" ~ "Louisiana",
                                state_fips == "24" ~ "Maryland",
                                state_fips == "28" ~ "Mississippi",
                                state_fips == "37" ~ "North Carolina",
                                state_fips == "40" ~ "Oklahoma",
                                state_fips == "45" ~ "South Carolina",
                                state_fips == "47" ~ "Tennessee",
                                state_fips == "48" ~ "Texas",
                                state_fips == "51" ~ "Virginia",
                                state_fips == "54" ~ "West Virginia")) |>
  group_by(state_name) |>
  summarize(state_rmse = Metrics::rmse(snap_participation_rate, .pred), .groups = "drop") |>
  ggplot() +
  geom_col(mapping = aes(x = state_name, y = state_rmse)) +
  theme(axis.text.x = element_text(angle = 45, vjust = .85)) +
  labs(x = "State", y = "RMSE", title = "RMSE by State")
The second bar graph shows error rates by urbanicity (1 being most urban and 9 most rural). Counties at either end of the urban-rural spectrum had the highest RMSE, while counties in the middle had the lowest.
predictions |>
  group_by(rural_urban_score) |>
  summarize(rural_urban_rmse = Metrics::rmse(snap_participation_rate, .pred), .groups = "drop") |>
  ggplot() +
  geom_col(mapping = aes(x = rural_urban_score, y = rural_urban_rmse)) +
  labs(x = "Urban-Rural Score", y = "RMSE", title = "RMSE by Urbanicity")
The scatter plot and regression line below display the relationship between absolute error and county racial make-up. Counties with a larger non-white share of the population had slightly higher error rates.
predictions |>
  mutate(abs_error = abs(.pred - snap_participation_rate)) |>
  ggplot(mapping = aes(x = white_pct, y = abs_error)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x, geom = "smooth") +
  labs(x = "Proportion of the Population that is White", y = "Error",
       title = "Error by County Racial Demographics")
Differential error rates across different sectors of the population are crucial for researchers and policymakers to consider when evaluating a model, as they may have ramifications for equity if policy decisions are made using the model's outputs.
Limitations
The primary limitations of this project were time and computing power. Because we only had access to our personal laptops, we were not able to use samples as large or models as complex as we would have preferred. This likely contributed to the magnitude of the error rate, and potentially to the differential error rates, as our sample sizes were not large enough to make accurate predictions for certain subsets of the population.
Bibliography
Creamer, J., & King, M. D. (2024, November 14). New interactive data tool shows how programs and expenses affect poverty measurement. U.S. Census Bureau. https://www.census.gov/library/stories/2024/11/supplemental-poverty-measure-visualization.html
Dickert-Conlin, S., Fitzpatrick, K., Stacy, B., & Tiehen, L. (2021). The downs and ups of the SNAP caseload: What matters? Applied Economic Perspectives and Policy, 43(3), 1026–1050.
Han, J. (2016). The impact of SNAP on material hardships: Evidence from broad-based categorical eligibility expansions. Southern Economic Journal, 83(2), 464–486.
Jones, J., Courtemanche, C., Denteh, A., Marton, J., & Tchernis, R. (2021). Do state SNAP policies influence program participation among seniors? IZA Discussion Paper Series, No. 14564. Institute of Labor Economics (IZA).
Pinard, C. A., Bertmann, F. M. W., Byker Shanks, C., Scharadin, B., & Yaroch, A. L. (2017). What factors influence SNAP participation? Literature reflecting enrollment in food assistance programs from a social and behavioral science perspective. Journal of Hunger & Environmental Nutrition, 12(2), 151–168.
Rothbaum, J., Fox, L., & Shantz, K. (2021, October). Fixing errors in a SNAP: Addressing SNAP under-reporting to evaluate poverty. U.S. Census Bureau.
U.S. Department of Agriculture, Food and Nutrition Service. (n.d.a). Broad-based categorical eligibility. https://www.fns.usda.gov/snap/broad-based-categorical-eligibility
U.S. Department of Agriculture, Food and Nutrition Service. (n.d.b). ABAWD Waivers. https://www.fns.usda.gov/snap/abawd/waivers